feat: Add OpenAI Compatible embedder for codebase indexing #4066
Conversation
- Implement OpenAiCompatibleEmbedder with batching and retry logic
- Add configuration support for base URL and API key
- Update UI with provider selection and input fields
- Add comprehensive test coverage
- Support all OpenAI-compatible endpoints (LiteLLM, LMStudio, Ollama, etc.)
- Add internationalization for 17 languages
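For illustration, here is a minimal sketch of what an embedder with batching and retry logic can look like, assuming the official openai npm client; the class shape, batch size, and retry parameters are illustrative rather than the PR's exact implementation.

import OpenAI from "openai"

const MAX_BATCH_SIZE = 100 // texts per request; assumed limit, not the PR's value
const MAX_RETRIES = 3

export class OpenAiCompatibleEmbedder {
  private client: OpenAI

  constructor(baseUrl: string, apiKey: string, private modelId: string) {
    // Any OpenAI-compatible endpoint (LiteLLM, LM Studio, Ollama, ...) works,
    // because only the base URL differs from a stock OpenAI configuration.
    this.client = new OpenAI({ baseURL: baseUrl, apiKey })
  }

  async createEmbeddings(texts: string[]): Promise<number[][]> {
    const results: number[][] = []
    // Split the input into fixed-size batches to stay under request limits.
    for (let i = 0; i < texts.length; i += MAX_BATCH_SIZE) {
      const batch = texts.slice(i, i + MAX_BATCH_SIZE)
      results.push(...(await this.embedWithRetry(batch)))
    }
    return results
  }

  private async embedWithRetry(batch: string[], attempt = 0): Promise<number[][]> {
    try {
      const response = await this.client.embeddings.create({
        model: this.modelId,
        input: batch,
      })
      return response.data.map((d) => d.embedding)
    } catch (error) {
      if (attempt >= MAX_RETRIES) throw error
      // Exponential backoff before retrying the failed batch.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000))
      return this.embedWithRetry(batch, attempt + 1)
    }
  }
}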
- Fix field count expectations (4 fields including Qdrant)
- Use specific test IDs for button selection
- Fix input handling with clear() before type()
- Use toHaveBeenLastCalledWith for better assertions
- Fix status text matching with regex pattern
- Remove unused waitFor import to fix ESLint error
- Fix test expectations to match actual component behavior for input fields
- Simplify provider selection test by removing complex mock interactions
- All CodeIndexSettings tests now pass (20/20)
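As a reference for the test fixes above, the clear-then-type pattern and the last-call assertion look roughly like this in a jest-style Testing Library test; the component props, test ID, and expected strings are hypothetical stand-ins, and toBeInTheDocument assumes the @testing-library/jest-dom matchers are installed.

import { render, screen } from "@testing-library/react"
import userEvent from "@testing-library/user-event"

it("updates the base URL setting", async () => {
  const user = userEvent.setup()
  const onChange = jest.fn() // hypothetical change handler for the settings field

  render(<CodeIndexSettings onChange={onChange} />) // component under test; props assumed

  const input = screen.getByTestId("base-url-input") // specific test ID, per the commit above
  await user.clear(input) // clear() before type(), so stale text never leaks into the value
  await user.type(input, "http://localhost:1234/v1")

  // Assert the final call rather than an intermediate keystroke.
  expect(onChange).toHaveBeenLastCalledWith("http://localhost:1234/v1")

  // Match status text with a regex pattern instead of an exact string.
  expect(screen.getByText(/indexed/i)).toBeInTheDocument()
})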
Hi @SannidhyaSah, thanks for this contribution! The ability to connect to various OpenAI-compatible endpoints is a great addition.

Regarding the introduction of the new OpenAiCompatibleEmbedder class: we could potentially modify the existing OpenAI embedder to accept a custom base URL instead of adding a separate embedder. This doesn't mean your implementation is wrong. However, it's an architectural consideration that might lead to a slightly more streamlined codebase in the long run. What are your thoughts on this?
@daniel-lxs I see the benefit of your proposed change in reducing the scope of modifications. My primary concern, though, is the potential for user confusion if we embed this within the OpenAI embedder. Given that the OpenAI compatible provider class functions as a universal proxy for other providers, being unable to distinguish between the two could become a UX nightmare. That said, I've gone ahead and implemented your suggestion to see it in practice. If you're confident that the user confusion risk is acceptable, I'm ready to push these changes.
@SannidhyaSah Even though the provider is OpenRouter, the user is able to set their own base URL. Let me know what you think.
@daniel-lxs I appreciate your insight on how identical models might reduce immediate confusion, and I certainly see the logic in that perspective. However, my concern about potential user confusion in the long run remains, especially given the broader and evolving landscape of embedding models and the distinct, crucial role our OpenAICompatible provider already plays.

As you mentioned, this will come in handy when a new provider lets users fetch or add models that don't already exist in our lists. For instance, other providers like Mistral AI not only offer their own embedding models, as seen here: https://docs.mistral.ai/capabilities/embeddings/code_embeddings/, but they also frequently provide free access to their APIs, which is often a strong preference for many users.

Beyond that, our OpenAICompatible endpoints are incredibly valuable for users connecting to local model hosting solutions like LM Studio and Ollama. For those using Open Web UI with Ollama to host their models on a VM, this integration is a real lifesaver. This makes the OpenAICompatible method a truly essential way for users to interact with a wide array of providers and their self-hosted models.

Looking further ahead, maintaining this as a completely separate class for OpenAICompatible providers will significantly reduce the need to add other new embedder classes in the future, since it's designed to be a flexible gateway. Furthermore, having it as a distinct class offers a critical security and stability benefit: if we ever need to adapt or modify something specifically for these external or compatible models, we only need to work within the OpenAICompatible class. This approach keeps the main OpenAIEmbedder class undisturbed, ensuring it remains the most secure and stable option for direct OpenAI integrations.

Therefore, maintaining this clear architectural separation between the OpenAI embedder and the broader OpenAICompatible provider helps ensure users don't get confused as they navigate various embedding options, both now and in the future, while also streamlining our maintenance and future development efforts.

I can add logic to fetch the models dynamically; that would resolve most of the future problems we might face here. Let me know your thoughts so that I can proceed accordingly.
@SannidhyaSah I just don't think adding another provider just to have these same models in the list would be very useful for users. Let me know what you think.
Oh, I see the confusion here.
@SannidhyaSah Yes! That was my idea; if we can somehow dynamically list them, then it totally makes sense to have a separate provider for it.
Yeah, what we do in the OpenAI Compatible provider in general is to try to list the models using /v1/models, but also to let the user enter their own model name in case that listing endpoint doesn't work. For what it's worth, one use case I heard about recently is people who want or need to use LiteLLM to proxy their model calls. In that case it's the same models, but routed through a proxy to meet their company's requirements.
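A rough sketch of that listing-with-fallback behavior, assuming the official openai client (function and variable names are illustrative):

import OpenAI from "openai"

// Try to enumerate models from the endpoint's /v1/models route; fall back to
// the user-supplied model name when the listing endpoint isn't implemented.
async function getAvailableModels(client: OpenAI, manualModelId: string): Promise<string[]> {
  try {
    const models: string[] = []
    for await (const model of client.models.list()) {
      models.push(model.id)
    }
    return models.length > 0 ? models : [manualModelId]
  } catch {
    // Listing failed (404, proxy restrictions, etc.); trust the user's input.
    return [manualModelId]
  }
}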
- Add manual model ID and embedding dimension configuration
- Enable custom model input via text field in settings UI
- Add modelDimension parameter to OpenAiCompatibleEmbedder
- Update configuration management to persist dimension setting
- Prioritize manual dimension over hardcoded model profiles
- Add comprehensive test coverage for new functionality

This allows users to specify any custom embedding model and its dimension for OpenAI-compatible providers, removing the dependency on hardcoded model profiles.
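The dimension-priority rule described above amounts to something like the following sketch; the profile table is a hypothetical stand-in for the hardcoded model profiles in embeddingModels.ts.

// Hypothetical subset of the hardcoded profiles (dimensions match OpenAI's docs).
const MODEL_PROFILES: Record<string, { dimension: number }> = {
  "text-embedding-3-small": { dimension: 1536 },
  "text-embedding-3-large": { dimension: 3072 },
}

// A manually configured dimension wins; hardcoded profiles are the fallback.
function resolveDimension(modelId: string, manualDimension?: number): number {
  if (manualDimension && manualDimension > 0) return manualDimension
  const profile = MODEL_PROFILES[modelId]
  if (!profile) {
    throw new Error(`Unknown model "${modelId}"; set an embedding dimension manually.`)
  }
  return profile.dimension
}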
…gs in all locales
…eEmbedder
- Remove modelDimension property and constructor parameter from OpenAiCompatibleEmbedder class
- Update ServiceFactory to not pass dimension to embedder constructor
- Update tests to match new constructor signature
- The dimension is still used for QdrantVectorStore configuration
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
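After this refactor the wiring looks roughly like the sketch below: the embedder constructor no longer receives the dimension, and only the vector store consumes it. The config shape and the QdrantVectorStore constructor are assumptions for illustration.

interface CodeIndexConfig {
  baseUrl: string
  apiKey: string
  modelId: string
  modelDimension?: number // manual setting, still honored for the vector store
  qdrantUrl: string
}

function createServices(config: CodeIndexConfig) {
  // The embedder is constructed without a dimension after the refactor.
  const embedder = new OpenAiCompatibleEmbedder(config.baseUrl, config.apiKey, config.modelId)

  // Dimension resolution (manual value first, per the earlier commit) moves here.
  const vectorSize = config.modelDimension ?? 1536 // fallback value is illustrative only
  const vectorStore = new QdrantVectorStore({ url: config.qdrantUrl, vectorSize })

  return { embedder, vectorStore }
}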
Testing on LM Studio: OpenAI Provider for Embeddings Testing.pdf

Apologies for the PDF, I am still trying to figure out how to get MD files out of Obsidian easily.
Hey @adamhill, we are using the OpenAI client to create the embeddings, and it seems like LM Studio is not 100% compatible if it's failing like this. I tested with OpenAI and it works without any issues. In that case it's probably better to wait for the LM Studio provider.
FYI, per the OpenAI API reference: https://platform.openai.com/docs/api-reference/embeddings/create

Edit: cURL works against LMS
curl http://10.5.1.251:1234/v1/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "nomic-embed-code",
"encoding_format": "float"
}'
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
0.02144342102110386,
-0.010139767080545425,
-0.01469159685075283,
...

LMS Logs
@adamhill I'm not sure what's going on with LM Studio, but we're basically just using the official OpenAI client; this PR only passes a different base URL.
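Concretely, the call path is no more than the following sketch with the official openai package; the URL and model mirror the cURL test above, and the API key is a placeholder since most local servers ignore it.

import OpenAI from "openai"

// Same official client; the only PR-specific piece is the custom baseURL.
const client = new OpenAI({
  baseURL: "http://10.5.1.251:1234/v1",
  apiKey: "not-needed-for-local-servers",
})

async function testEmbedding() {
  const response = await client.embeddings.create({
    model: "nomic-embed-code",
    input: "The food was delicious and the waiter...",
    encoding_format: "float",
  })
  console.log(response.data[0].embedding.length) // prints the embedding dimension
}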
Related GitHub Issue
Closes: #4065
Description
This PR implements OpenAI Compatible embedder support for codebase indexing, enabling users to connect to any OpenAI-compatible API endpoint (LiteLLM, LMStudio, Ollama, etc.) for generating embeddings.
Key implementation details:
- OpenAiCompatibleEmbedder with intelligent batching and retry logic

Design choices:
Test Procedure
Automated Testing:
- pnpm test - All 20+ new unit tests pass
- pnpm lint - No linting errors
- pnpm check-types - All TypeScript types valid

Manual Testing:
Type of Change
Pre-Submission Checklist
- Linting passes (npm run lint).
- Debugging code (e.g., console.log) has been removed.
- Tests pass (npm test).
- The branch is up to date with the main branch.
- npm run changeset has been run if this PR includes user-facing changes or dependency updates.

Documentation Updates
Additional Notes
This implementation supports all major OpenAI-compatible providers, including LiteLLM, LMStudio, and Ollama.
The feature is fully backward compatible and doesn't affect existing embedder configurations.
Important
Adds support for OpenAI-compatible embedders in codebase indexing with new embedder class, configuration, UI integration, and comprehensive testing.
- New OpenAiCompatibleEmbedder class for OpenAI-compatible API endpoints with batching and retry logic.
- Updates codebase-index.ts to include openai-compatible as an embedder provider.
- Updates CodeIndexSettings.tsx to support dynamic provider selection.
- Configuration handling in config-manager.ts.
- Updates provider-settings.ts and global-settings.ts for new configuration keys.
- Tests in openai-compatible.test.ts and config-manager.test.ts.
- Updates service-factory.test.ts.
- Adds OpenAiCompatibleEmbedder to service-factory.ts for embedder creation.
- Updates embeddingModels.ts to include openai-compatible models.
This description was created by ellipsis-dev[bot] for 9755fd7.